Optimal detection of changepoints with a linear computational cost

Authors

  • R. Killick
  • P. Fearnhead
  • I. A. Eckley
Abstract

We consider the problem of detecting multiple changepoints in large data sets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example in genetics as we analyse larger regions of the genome, or in finance as we observe time-series over longer periods. We consider the common approach of detecting changepoints through minimising a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalised likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions, and hence the optimal number and location of changepoints, that has a computational cost which, under mild conditions, is linear in the number of observations. This compares favourably with existing methods for the same problem whose computational cost can be quadratic or even cubic. In simulation studies we show that our new method can be orders of magnitude faster than these alternative exact methods. We also compare with the Binary Segmentation algorithm for identifying changepoints, showing that the exactness of our approach can lead to substantial improvements in the accuracy of the inferred segmentation of the data.

R. Killick is Senior Research Associate, Department of Mathematics & Statistics, Lancaster University, Lancaster, UK. P. Fearnhead is Professor, Department of Mathematics & Statistics, Lancaster University, Lancaster, UK. I.A. Eckley is Senior Lecturer, Department of Mathematics & Statistics, Lancaster University, Lancaster, UK. The authors are grateful to Richard Davis and Alice Cleynen for providing the Auto-PARM and PDPA software respectively. Part of this research was conducted whilst R. Killick was a jointly funded Engineering and Physical Sciences Research Council (EPSRC) / Shell Research Ltd graduate student at Lancaster University. Both I.A. Eckley and R. Killick also gratefully acknowledge the financial support of the EPSRC grant number EP/I016368/1.

arXiv:1101.1438v3 [stat.ME] 9 Oct 2012
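The minimisation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes a squared-error segment cost (suited to changes in mean), a user-chosen penalty `beta` per changepoint, and the standard optimal-partitioning recursion F(t) = min over s < t of F(s) + C(y[s:t]) + beta, with candidates s discarded once F(s) + C(y[s:t]) exceeds F(t). The function names `segment_cost` and `pruned_changepoints` are hypothetical.

```python
import numpy as np

def segment_cost(csum, csum2, s, t):
    """Squared-error cost of y[s:t]: sum of squared deviations from the segment mean."""
    n = t - s
    total = csum[t] - csum[s]
    return (csum2[t] - csum2[s]) - total * total / n

def pruned_changepoints(y, beta):
    """Minimise (sum of segment costs) + beta * (number of changepoints).

    Returns the sorted changepoint positions (each is the index where a new
    segment starts) and the minimised penalised cost.
    Assumption: squared-error cost, for which the pruning constant is 0.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Cumulative sums let each segment cost be evaluated in constant time.
    csum = np.concatenate(([0.0], np.cumsum(y)))
    csum2 = np.concatenate(([0.0], np.cumsum(y * y)))

    F = np.empty(n + 1)            # F[t]: optimal penalised cost of y[:t]
    F[0] = -beta                   # so that the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)  # last[t]: start of the final segment of y[:t]
    candidates = [0]               # possible positions of the most recent changepoint

    for t in range(1, n + 1):
        vals = [F[s] + segment_cost(csum, csum2, s, t) + beta for s in candidates]
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = candidates[best]
        # Pruning step: drop s if F[s] + C(y[s:t]) > F[t]; such an s can
        # never be the optimal last changepoint for any later t.
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= F[t]]
        candidates.append(t)

    # Backtrack through the stored segment starts to recover the changepoints.
    cps = []
    t = n
    while last[t] > 0:
        cps.append(int(last[t]))
        t = last[t]
    return sorted(cps), float(F[n])

# Usage: a noiseless series with one jump in mean at index 50.
y = np.concatenate([np.zeros(50), np.full(50, 10.0)])
cps, cost = pruned_changepoints(y, beta=5.0)
print(cps)  # prints [50]
```

The pruning step is what keeps the candidate list short when changepoints occur regularly through the data. Without it, the loop is the plain quadratic-cost dynamic programme.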


Similar resources

On optimal multiple changepoint algorithms for large data

Many common approaches to detecting changepoints, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistica...

Full text

The use of cumulative sums for detection of changepoints in the rate parameter of a Poisson Process

This paper studies the problem of multiple changepoints in the rate parameter of a Poisson process. We propose a binary segmentation algorithm in conjunction with a cumulative sums statistic for detection of changepoints, such that in each step we need only test for the presence of a simple changepoint. We derive the asymptotic distribution of the proposed statistic, prove its consistency and obtain ...

Full text

Detection of changes in variance using binary segmentation and optimal partitioning

This work explores the performance of binary segmentation and optimal partitioning in the context of detecting changes in variance for time-series. Both binary segmentation and optimal partitioning are based on cost functions that penalise a large number of changepoints in order to avoid overfitting. Analysis is performed on simulated time-series; first on Normal data with constant but unknown...

Full text

Change detection from satellite images based on optimal asymmetric thresholding the difference image

As a process for detecting changes in land cover using multi-temporal satellite images, change detection is one of the practical subjects in the field of remote sensing. Any progress on this issue increases the accuracy of results, facilitates and accelerates the analysis of multi-temporal data, and reduces the cost of producing geospatial information. In this study, an unsupervised chang...

Full text

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), which has been intensified by the increased volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...

Full text



Publication date: 2012